ABSTRACT

This paper examines 17 national datasets that are available 
for policy studies and research about college faculty. The datasets include 
11 containing faculty information, two about student enrollment, two about 
degrees awarded, and two about institutional activity. Each of the following 
datasets is individually described: (1) National Science Foundation-National
Institutes of Health Survey of Graduate Students and Postdoctorates in
Science and Engineering; (2) Survey of Earned Doctorates; (3) Survey of 
Doctorate Recipients; (4) National Survey of Recent College Graduates; (5) 
National Survey of College Graduates; (6) National Center for Education 
Statistics (NCES) Integrated Postsecondary Education Data System (IPEDS) 
Survey of Earned Degrees; (7) NCES IPEDS Survey of Salaries, Tenure, and 
Fringe Benefits of Full-Time Instructional Faculty; (8) NCES IPEDS Fall Staff 
Survey; (9) College and University Personnel Association National Faculty 
Salary Survey by Discipline and Rank; (10) Oklahoma State Faculty Salary 
Survey; (11) National Study of Postsecondary Faculty; (12) Higher Education 
Research Institute Faculty Survey; (13) Doctoral Program Rankings, 1995; (14)
American Association of University Professors Faculty Compensation Survey;
(15) NCES IPEDS Fall Enrollment Survey; (16) NCES IPEDS Institutional
Characteristics Survey; and (17) NCES IPEDS Finance Survey. World Wide Web 
addresses are provided for datasets when available. (DB) 

time and the support of AIR, NCES, and NSF staff which
made this review of the national datasets possible. 

Introduction 

Numerous policy issues arise at the institutional, state,
and national levels that may be addressed with data
about faculty. While in the past it has been difficult if not
impossible to gather data to support this research, the 
advent of the World Wide Web has transformed the 
dissemination and diffusion of the national datasets. In 
particular, the National Center for Education Statistics (NCES) 
and the National Science Foundation (NSF) have taken 
significant steps to make the data they collect available on 
the Web in a readily-accessible format for analysis. 

The purpose of this AIR Professional File article is to
document the national datasets that may be used for
policy studies and research about faculty. These comprise 11
datasets that contain faculty information (the IPEDS S,
IPEDS SA, CUPA, Oklahoma State, AAUP, NSOPF, HERI,
SDR, NRC, NSCG, and NSRCG); two datasets about
student enrollment (the IPEDS EF and the GSS); two
datasets about degrees awarded (the IPEDS C and the
SED); and two datasets about institutional activity (the
IPEDS F and IPEDS IC).

An extensive review of each dataset is provided. This 
includes a discussion of the nature of each survey and 
examples of how the data may be used for faculty studies. 
The review also describes whether each dataset is based 
on a population or sample survey, key variables, the 
administering agency, response rates, where the data 
may be obtained, what historical data are available, and
the most current data available. Most of the discussion 
focuses on the datasets with information about faculty or 
potential faculty. The six non-faculty datasets are analyzed 
in terms of how they may be used in conjunction with the 
other 11 for purposes such as calculating performance
measures. 

When examining the data, it is helpful to think of using 
different lenses for different kinds of analysis. Most of the 
datasets may be used for peer comparison of specific 
institutions and these serve as important resources for 
institutional research. It is also important to think about 
ways to aggregate the data by Carnegie classification 
and/or control. Regional issues such as the impact of cost 
of living on faculty salaries may be addressed with location, 
state, and zip code fields. At the national level, patterns 
of faculty workload, salary compression, and access may 
be discerned. 

Caveats 

It is important to understand certain caveats about 
how the data were collected and how the data should be 
used. For example, the IPEDS datasets on full-time 
instructional faculty salaries (SA) do not include data for 
survey cells or items for which there are three or fewer 
faculty. NCES does this to safeguard privacy by 
preventing the possible identification of individuals. As a
result, average salaries calculated from records with suppressed
cells will differ from those based on complete data. In the data administration
and dissemination of each dataset, many such decisions 
are made and it is critical that users carefully read the 
field definitions and instrument collection instructions. 
Another element to consider with each file is the census
date, especially when merging files that are presumed to 
be of the same year or semester. 
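
The suppression caveat can be illustrated with a minimal Python sketch. The record layout, rank labels, and dollar figures below are hypothetical and do not reflect the actual IPEDS SA file format; the sketch only shows how an average computed over published cells leaves out the suppressed ones, and why the number of suppressed cells should be reported alongside it.

```python
# Hypothetical rank-level cells for one institution:
# (rank, faculty count, total salary outlay).
# None marks a cell withheld because it covers three or fewer faculty.
cells = [
    ("Professor", 40, 3_200_000),
    ("Associate professor", 25, 1_500_000),
    ("Instructor", None, None),  # suppressed cell
]

def average_salary(cells):
    """Return (mean salary over published cells, number of suppressed cells)."""
    published = [(n, outlay) for _, n, outlay in cells if n is not None]
    suppressed = len(cells) - len(published)
    faculty = sum(n for n, _ in published)
    outlay = sum(o for _, o in published)
    return (outlay / faculty if faculty else None), suppressed

mean, n_suppressed = average_salary(cells)
print(f"mean={mean:.0f}, suppressed cells={n_suppressed}")
```

Because the suppressed cell contributes neither faculty nor dollars, the resulting mean describes only the published ranks, not the whole institution.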

All of the datasets are based on headcount. Nowhere 
is the variable of full-time equivalent (FTE) faculty 
collected. Faculty FTE are often assumed, using the full- 
time headcount of the IPEDS SA or Fall Staff Survey (S). 
However, definitions of full-time faculty vary between 
institutions and between individuals. Some surveys 
include faculty on leave, while others do not. How did
institutions determine whether faculty are classified as
teaching, research, or service? This requires examination of
multiple funding records in which faculty are paid from 
different accounts for different purposes. 

Some survey data are weighted back to the universe 
of institutions as documented by NCES, some are 
weighted to population estimates from the Decennial 
U.S. Census, and others are weighted to surveys of the 
entire population (for example research doctorates from 
the Survey of Earned Doctorates). It is important to 
examine sample sizes, response rates, and stratification 
procedures. 
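
One simple way to weight respondents back to a known universe is to give each respondent a weight equal to the stratum's population count divided by its number of respondents. The Python sketch below assumes that approach for illustration only; the strata, counts, and tenure flags are invented, and the actual weighting procedures for each survey are those documented by the administering agency.

```python
# Hypothetical strata: population counts (the universe) and respondents.
population = {"public": 1200, "private": 800}
respondents = {"public": 300, "private": 100}

# Each respondent stands in for population/respondents members of its stratum.
weights = {s: population[s] / respondents[s] for s in population}

# Hypothetical respondent records: (stratum, holds tenure?).
sample = (
    [("public", True)] * 180 + [("public", False)] * 120
    + [("private", True)] * 40 + [("private", False)] * 60
)

weighted_tenured = sum(weights[s] for s, tenured in sample if tenured)
weighted_total = sum(weights[s] for s, _ in sample)

unweighted_share = sum(1 for _, tenured in sample if tenured) / len(sample)
weighted_share = weighted_tenured / weighted_total

print(f"unweighted={unweighted_share:.2f}, weighted={weighted_share:.2f}")
```

In this invented example the private stratum is under-represented among respondents, so the weighted share differs from the raw share; this is why sample sizes, response rates, and stratification need to be examined before estimates are compared.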

Much has been written about peer comparisons and 
the reader is referred to this literature for better discussion 
of caveats in using data for this purpose (Brinkman and 
Teeter, 1987). One word of caution is that users of these
data need to look for anomalies and outliers. Does a 
suggested pattern such as low expenditures for instruction 
per faculty FTE show up in other variables, such as 
expenditures for libraries? Does the same pattern show 
up in previous years? 

When working with data dictionaries to understand the 
structure of each survey, it is helpful to have a copy of the 
actual questionnaire in hand. It is even better to have a 
copy of an individual institution’s survey submission, in 
order to correctly match up field names with data cells on 
the survey form. While great time is saved with electronic 
access to the data, sometimes it is more efficient, timely, 
and cheaper to obtain print copies of surveys, such as 
institutional submissions to the CUPA or IPEDS IC
surveys. These may then be collated within a spreadsheet 
for comparisons of peers and competitors on chosen 
variables. This is especially necessary when the data 
are not available with institutional identifiers. 

Some other critical questions to ask: How are missing 
submissions and variable items imputed? What is the 
imputation method? Should imputed data be used if 
available or should the previous year’s data be 
substituted? What is the disciplinary taxonomy used and 
how does it relate to the discipline structures of interest? 
It is often useful to mirror the disciplinary mix at an 
institution by weighting comparison data. What kind of 
taxonomies are available for discipline and institutional 
type fields? 
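
Weighting comparison data to mirror an institution's disciplinary mix can be sketched as follows. The disciplines, headcounts, and salary figures are hypothetical; the sketch simply averages peer salaries by discipline using the home institution's faculty headcounts as weights, and contrasts the result with an unweighted average across disciplines.

```python
# Hypothetical home-institution faculty headcount by discipline.
home_mix = {"Biology": 30, "Economics": 10, "History": 20}

# Hypothetical peer-group average salary by discipline.
peer_salary = {"Biology": 70_000, "Economics": 90_000, "History": 60_000}

def mix_weighted_average(mix, salary_by_discipline):
    """Average peer salaries using the home institution's headcounts as weights."""
    common = [d for d in mix if d in salary_by_discipline]
    total = sum(mix[d] for d in common)
    return sum(mix[d] * salary_by_discipline[d] for d in common) / total

benchmark = mix_weighted_average(home_mix, peer_salary)
simple = sum(peer_salary.values()) / len(peer_salary)

print(f"mix-weighted={benchmark:.0f}, unweighted={simple:.0f}")
```

Disciplines that appear in only one of the two tables are skipped here, which is one place where differences between disciplinary taxonomies would need to be resolved by hand.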

Who prepared the submission? This is a particularly
vexing and often hidden problem. Some schools have
a well-staffed institutional research office which is involved 
in peer comparisons and very aware of the need for 
clean data. Sometimes schools rely on human resources 
offices to complete personnel-related surveys and staff 
may not be aware of ways in which aggregate data are 
used. 

The interpretation of the survey instructions may be 
different depending on who completed it. For example, 
in completing the NSF-NIH Science and Engineering 
Graduate Student Survey (GSS), some schools gather 
the data centrally while others send it to departments to 
complete. Discussions held by the author with the vendor, 
Quantum Research Corporation, NSF staff, and 
institutional researchers suggest that very different results 
are obtained with each method of collection. Departments 
will count postdoctoral fellows that do not appear in the 
human resource payroll files that institutional research 
offices would use to complete the survey centrally. 
Departments may count students who are not actually 
enrolled in the semester of the census date or for whom 
they have only an advising load. 

Many of the surveys have undergone extensive 
changes over time, making historical comparisons at 
times impossible. Yet the field names may remain similar, 
leading the casual user to think that the data may be 
used in this way. Copies of some early survey instruments 
are available on the Internet, though others are not 
available. Users must read carefully about changes in 
the instrument and in the collection effort. 

The NCES is working to provide better data for decision- 
making. In 1994, Congress authorized the creation of the 
National Postsecondary Education Cooperative (NPEC). 
NPEC's mission is “to identify and communicate on-going 
and emerging issues germane to postsecondary education, 
and to promote the quality, comparability and utility of 
postsecondary data and information that support policy 
development, implementation, and evaluation” (NPEC, 
http://nces.ed.gov/npec/). All levels of postsecondary 
education are included in NPEC activities, along with 
statewide governing and coordinating agencies, federal 
agencies, and national higher education associations. 

In the aftermath of the National Commission on the 
Cost of Higher Education report to Congress, “Straight 
Talk About College Costs and Prices,” and the Higher 
Education Reauthorization Act of 1998, NCES is working
to redesign the IPEDS surveys. NCES has an internal 
task force and is building a national dialogue about ways 
in which the data are used and collected. NCES is also 
working with four IPEDS redesign subcommittees of the 
NPEC to focus on finance, faculty/staff, student, and 
survey population/sample issues. It is clear that many of 
the IPEDS forms will change dramatically in the next 
several years and users of the national data are urged to 
follow and participate in this dialogue. The reports of the 
task force and working groups are available on the Web
at http://nces.ed.gov/lpeds/whatsnew.html. 

When working with these datasets, it is necessary to 
recognize the difference between the kind of descriptive 
statistics used for most institutional research and more 
sophisticated methods of quantitative analysis. Though 
software programs such as SAS, SPSS, or Access are 
used for reading, merging, and re-coding the data, many of 
the policy analyses described in this paper require only 
simple cross-tabs or pivot table calculations. While the 
National Study of Postsecondary Faculty (NSOPF) and 
other surveys may be used for complex studies, such as 
faculty life and research productivity by discipline, the focus 
of this paper is on more pragmatic policy analysis. With this
approach to descriptive statistics, the results may be used 
only to suggest the presence of a possible pattern in the 
data, not for any kind of generalization. 
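
As an illustration of how little machinery such descriptive analysis needs, the following Python sketch builds a simple cross-tab of headcount by rank and gender using only the standard library. The records are hypothetical and stand in for whatever file has already been read and re-coded.

```python
from collections import Counter

# Hypothetical faculty records: (rank, gender).
records = [
    ("Professor", "F"), ("Professor", "M"), ("Professor", "M"),
    ("Assistant", "F"), ("Assistant", "F"), ("Assistant", "M"),
]

counts = Counter(records)  # headcount per (rank, gender) cell
ranks = sorted({r for r, _ in records})
genders = sorted({g for _, g in records})

# Print a simple cross-tab with row totals.
print("Rank".ljust(12) + "".join(g.rjust(4) for g in genders) + "Total".rjust(7))
for r in ranks:
    row = [counts[(r, g)] for g in genders]
    print(r.ljust(12) + "".join(str(c).rjust(4) for c in row) + str(sum(row)).rjust(7))
```

The same counts could be produced with a spreadsheet pivot table; the point is that the output describes the records at hand and does not support generalization beyond them.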

Review of Datasets 

The following table (also Appendix A) lists the datasets
and their availability in various formats. In addition to the 
primary Websites (WebCASPAR, SESTAT, and IPEDS), 
the table documents whether files are available by FTP 
download; if there is any cost associated with obtaining the
data; whether they are available in print or CD format; if a
user microdata license is required; if the data are 
commercially available; and if the data are readily available 
in Web database applications. 

(1) NSF-NIH Survey of Graduate Students and 
Postdoctorates in Science and Engineering 
(GSS) 

This survey has been conducted in some form since 
1966 by the National Science Foundation, in cooperation 
since 1973 with the National Institutes of Health. 
Information about the GSS is available on the Web at 
http://www.nsf.gov/sbe/srs/gss/. 

This survey documents graduate enrollment and 
financial support for graduate students. The GSS is the 
“only nationally representative data bank on sources of 
support of graduate science and engineering (S&E) 
students and their enrollment characteristics, and on
S&E postdoctoral appointments” (Guide, 1998).
Institutional aggregate totals of discipline-specific data
are collected for full- and part-time students by gender
within ethnicity and by funding source, and for post-docs
and other nonfaculty doctoral research staff.



Table 1: Datasets by Availability

Dataset       Source  WebCASPAR  IPEDS Interactive  SESTAT  FTP  Free  Print  CD  License  For Sale  Other Web

Faculty
S - IPEDS     NCES                                          X    X     X      X            X
SA - IPEDS    NCES    X          X                          X    X     X      X            X         X
CUPA          CUPA                                                     X                   X
OK            OK                                                       X                   X
AAUP          AAUP                                                     X                   X
NSOPF         NCES                                               X     X      X   X                  X
HERI          HERI                                                     X                   X
SDR           NSF                                   X            X     X          X
NRC           NRC     X                                                X      X                      X
NSCG          NSF                                   X            X     X          X
NSRCG         NSF                                   X            X     X          X

Enrollment
EF - IPEDS    NCES    X          X                          X    X     X      X            X         X
GSS           NSF     X                                     X    X     X

Degrees
SED           NSF     X                                          X     X          X
C - IPEDS     NCES    X          X                          X    X     X      X            X         X

Institutional activity
IC - IPEDS    NCES    X                                     X    X     X      X            X
F - IPEDS     NCES    X                                     X    X     X      X            X

Some critical variables have changed over the years, 
but the survey offers consistent data about enrollment at 
the program and/or departmental level. The entire universe 
of graduate programs in S&E has been surveyed since 
1988. In 1997, data on 11,597 departments at 601 
institutions were collected, with a 98.3% response rate. 
Data for non-respondents require complete imputation
from previous data where available or from peer institutions. 
Approximately 14.4% of all respondents had one or more 
variables imputed. Data from 1966-74 are not comparable
with later years. The 1997 data are the most current 
available. 

GSS data are released annually in the NSF publication 
Graduate Students and Postdoctorates in Science and 
Engineering and included in other publications, such as 
Science and Engineering Indicators and Women,
Minorities, and Persons with Disabilities in Science and
Engineering. These and other NSF publications are
available online in HTML and Adobe PDF format. 

In terms of institution-specific reporting, the Website 
Academic Institutional Profiles includes rankings and trend data 
by clusters of discipline (http://www.nsf.gov/sbe/srs/profiles/). 
This includes data from all of the NSF datasets, including 
GSS reports about: (1) the characteristics of full- and part- 
time students; (2) full-time graduate S&E students receiving 
primary support from federal sources by field; (3) full-time 
graduate S&E students receiving primary support from 
federal sources by type and primary source of support; (4) 
characteristics of postdoctorates; and (5) characteristics 
of federally-supported postdoctorates. 

Public use data files from 1972 to 1995 are available 
for FTP download from the Internet in ASCII format. See
http://www.nsf.gov/sbe/srs/gss/95dug/start.htm for more 
information. The GSS questionnaire is available in 
graphical (GIF) format for viewing. While no SAS or SPSS 
programs are publicly available to help users read the 
files, the complex record layout is documented. 

In many ways, it is no longer necessary to use the 
public use data files because the data are available in a 
much more manageable format with WebCASPAR. 
CASPAR (Computer-Aided Science Policy Analysis and 
Research) was originally developed by the National 
Science Foundation (NSF) and Quantum Research 
Corporation (QRC) as a CD-ROM product and has since
been migrated to a Web browser-based software tool 
(http://caspar.nsf.gov/webcaspar). Described as “Your 
Virtual Bookshelf of Statistical Academic Data,” 
WebCASPAR makes data from NSF, NCES, and the 
National Research Council’s (NRC) Research-Doctorate 
Program Ratings available online to researchers and policy 
analysts. 

WebCASPAR offers numerous features to aid users, 
including: (1) highlights of what’s new on the Website; (2) 
a data map of sources; (3) tutorials; (4) predefined reports; 
(5) various ways to retrieve data, such as by institution or 
multiple sources; (6) visual diagrams of cross-tabs; (7)
personalized options, such as creating special groups of 
institutions and the ability to save and run individualized 
reports; and (8) the ability to save results in Excel, Lotus, 
HTML, and SAS read file formats. Currently, data from 
the GSS are available in WebCASPAR for 1972 to 1996. 

The disciplinary taxonomy used to collect data by 
discipline/department is unique to the GSS and includes 
an exhaustive breakout on health fields. A lookup table is 
available on WebCASPAR, which rolls the disciplines in 
the GSS into the 59 possible combinations of unique 
CASPAR discipline clusters. The CASPAR discipline 
taxonomy, while thorough, does discard some disciplinary 
distinctions. However, WebCASPAR has evolved to offer, 
in many cases, both the survey taxonomy and its own 
proprietary system. 

Data on some social science programs not always 
associated with science and engineering are included in 
the GSS, among them psychology, economics, 
anthropology, geography, political science, public 
administration, linguistics, and sociology. The focus of the 
survey on science and engineering does not permit data 
collection on other disciplines. The GSS is the primary 
data source for information about S&E post-doctoral 
appointments. However, postdoc data for people employed 
outside of academe, such as at national labs, must be 
gathered from other sources. 

The GSS is the best source for gathering discipline- 
specific S&E graduate enrollment trends at the institutional 
level, broken out by gender, ethnicity, full-/part-time, and 
funding status. Comparable graduate enrollment data 
are gathered by two other surveys, the CGS-GRE Survey 
of Graduate Enrollment and Peterson’s Annual Survey of 
Graduate Institutions. Next to CUPA and the SDR, the
GSS is the critical source for documenting, however
incompletely, the reported population of S&E postdocs and
the nebulous category of “non-faculty research staff.”

These data provide a large view of the complex industry 
of S&E knowledge production, including important access 
and funding issues. It is possible to construct models of 
the pipeline of potential faculty for supply and demand 
studies and to analyze shifting funding patterns for graduate 
education. Researchers may use the GSS to construct 
prestige rankings by program based on enrollment and 
funding, comparable to the NRC rankings. 

(2) Survey of Earned Doctorates (SED) 

This survey is sponsored by NSF and four other federal 
agencies and is designed to collect data about the number 
and characteristics of doctoral recipients from U.S. 
institutions. The SED data are available as part of the 
Doctorate Records File (DRF) project, which documents 
all persons receiving research doctorates in the U.S. 
since 1957. The DRF also contains limited data on 
doctorate recipients from 1920-1956. The SED survey
does not include professional degrees such as the M.D. 
Information about the SED is available on the Web at 
http://www.nsf.gov/sbe/srs/ssed/.

Since 1997, the SED has been collected by the National 
Opinion Research Center. Prior to this, the National 
Research Council’s (NRC) Office of Scientific and 
Engineering Personnel conducted the survey. 

Among the 42,415 new research doctorates awarded by 390
institutions in 1996, the response rate was 93%. The survey is
usually considered a requirement of graduation 
paperwork. Records for non-respondents are created 
from commencement lists and other sources. No 
imputation methods are used for missing data items and 
item non-response rates range from 0.4% for gender to 
4.6% for race. Aggregated to the institution level, these 
data compare well to the IPEDS Degrees Completions 
survey collected by NCES, which will be discussed later. 
Where the Completions survey documents all doctoral 
awards and is reported by institutions, the SED surveys 
all research doctorates and is completed by individuals. 

The key variables in the SED include: academic 
institution, citizenship, country of birth, country of 
citizenship, birth date, disability status, educational 
attainment of parents, educational history, enrollment 
status (full-/part-time), field of degree, field of employment, 
field of science and engineering, field of study, level of 
degree, marital status, number of dependents, birth place 
(within U.S.), postgraduate plans, primary source of 
financial support (e.g., NSF, NIH, etc.), race and ethnicity, 
gender, type of academic institution (historically black/ 
others), type of employer planned, type of financial support 
(e.g., fellowship, research assistantship, etc.), type of 
institutional control (public versus private), and work 
activity planned. 

The three-digit taxonomy of disciplines used for the 
SED continues to evolve and is the most exhaustive of 
any of the surveys reviewed, with more than 300 
specialties, albeit not without debate about the most 
current and appropriate taxonomy. Specialty data are 
collected for each degree earned, the dissertation topic, 
the field of intended postdoctoral study, and the expected 
field of work. 

In addition to complete demographic data, the SED 
collects data on educational history, time to degree, 
financial support, and post-graduation plans. Numerous 
tables from the data are available on the Web and in the 
annual publication Science and Engineering Doctorate
Awards. These degree data are also used by NSF for
publications such as Science and Engineering Degrees,
Science and Engineering Indicators, and Women,
Minorities, and Persons with Disabilities in Science and
Engineering.

Aggregate data at the institution level are available on 
WebCASPAR for academic years 1965-66 through 1995-96.
No public use data files comparable to the GSS file
are available, in part because of the need to protect 
confidentiality, but researchers may obtain a microdata 
license from NSF. 

The SED data on time to degree are an invaluable 
resource, although the survey does not account for periods 
of stop-out or part-time study. WebCASPAR includes 
SED data on mean and median times between completion 
of an undergraduate program and date of awarded 
doctorate, for every doctorate-granting institution and for 
every U.S. school of baccalaureate origin. Since one-third
of doctoral recipients are foreign, an extensive coding
manual was developed to document foreign institutions,
entitled Mapping the World of Education: The Comparative
Database System (Hunt, 1994). Various
“Issue Briefs” and “Data Briefs” are prepared by NSF from
the SED, for example Hill’s (1997) analysis “Doctorate 
Awards Increase in S&E Overall, but Computer Science 
Declines for First Time.” 

Data collected on post-doctoral plans are useful for 
employment analysis and faculty supply and demand, but 
these are based on graduates’ intentions, not necessarily 
the reality of job hunting. The Survey of Doctorate 
Recipients (SDR) is a more accurate predictor than the 
SED for estimating what percentage of Ph.D. recipients 
are likely to complete a post-doc. SED data on sector of 
planned employment are highly correlated with actual 
employment data from the SDR. The data on whether 
graduates plan to enter academe, government, or the 
private sector are useful, especially with the exhaustive 
specialty breakout. SED trends show dramatic shifts in 
the percent of Ph.D.s interested in academe. 

Researchers use the SED for studies about doctoral 
graduate characteristics and about the impact of various 
variables, such as funding, on time to degree. The SED 
also affords the institutional researcher critical data about 
peer comparisons. For example, it is possible to compare 
doctoral degree data in a much more complex manner 
than is possible with the IPEDS completions survey. 

(3) Survey of Doctorate Recipients (SDR) 

The SDR is a longitudinal survey that was initiated in 
1973. It is a biennial survey of science and engineering
doctorate holders in the U.S.; new doctoral recipients are added
each cycle and individuals older than 75 are dropped.
The sample is drawn from the Doctorate Records File of 
the SED, with a sampling rate of approximately 1 to 12, 
with fifty thousand individuals surveyed in 1995. From 
1977 to 1995, the SDR included humanities doctorates. It 
is hoped that, with additional funding, these data will once 
again be collected. Information about the SDR is available 
on the Web at http://www.nsf.gov/sbe/srs/ssdr/. 

Initial data collection is done by mail, with follow-up by 
computer-assisted telephone interviewing (CATI). The
sample is stratified by field of degree, gender, race and 
ethnicity, disability, and U.S. versus foreign birth place. 
Data from the survey are weighted to the total S&E 
doctorate population in the U.S., using the Doctorate 
Records File. The National Research Council conducted 
the SDR until 1995. The National Opinion Research 
Center took over the data collection effort in 1997. The 
most recent data available are for 1997. 

In addition to the demographic and degree data already 
maintained in the SED, the SDR collects data about 
citizenship/country; disability status; work history; employer
size; employment status (unemployed, part-time, or full- 
time); faculty rank; tenure status; geographic place of 
employment; labor force status; marital status; number of 
children; occupation; patent/publication activity; 
postdoctorate status; primary work activity (e.g., teaching, 
basic research, etc.); salary; previous year earnings; school 
enrollment status; and sector of employment (academia, 
industry, government). 

Numerous reports based on the SDR are published by 
NSF in print and on the Web in HTML, PDF, and Excel 
formats. The primary reports of SDR data are the biennial 
publications Characteristics of Doctoral Scientists and 
Engineers in the United States and Doctoral Scientists 
and Engineers in the United States: Profile. These data
are also included in the publications Science and
Engineering Indicators and Women, Minorities, and
Persons with Disabilities in Science and Engineering.
Various “Issue Briefs” and “Data Briefs” are also prepared
by NSF from the SDR, for example Regets’ (1997) analysis
“What’s Happening in the Labor Market for Recent Science
and Engineering Ph.D. Recipients?”

The Science and Engineering Data System (SESTAT) 
was developed to provide research access to the three 
survey files maintained by NSF - the SDR, the National 
Survey of Recent College Graduates (NSRCG), and the 
National Survey of College Graduates (NSCG). The 
individual or combined data files may be used for analysis 
of the S&E workforce. The 1995 SESTAT database 
includes 105,106 observations, including 35,370 from the 
SDR, 53,448 from the NSCG, and 16,338 from the NSRCG
surveys. 

Researchers may access public SESTAT files either 
on the Web or by obtaining a microdata license for the 
complete file. Web access is offered with a simple 
registration form at http://sestat.nsf.gov/. Selected SESTAT 
tables and a data element dictionary are available online. 
These include extensive technical notes and frequencies 
of responses to each variable in the 1993 and 1995 files. 
The SESTAT variables are also organized by topic, 
keyword, and crosswalks between survey, question 
number, and SAS field name. 

A problem of the SDR for faculty studies is that data
about current faculty employment are collected only by major
postsecondary occupation codes. These occupational data
lose the fine level of detail available in the coding for field 
of degree from the SED. Field of degree is sometimes 
used as if it were comparable to field of employment in the 
SDR. However, some Ph.D.s work outside of their field of 
doctoral degree (Burton and Parker, 1998). 

Any crosswalk between the SDR and WebCASPAR 
or other disciplinary taxonomies must be very simplistic, 
given the broad nature of the postsecondary occupation 
codes. Another problem is that, because it is based on 
the SED for its sample, the SDR excludes persons with 
professional degrees in the medical sciences, yet it 
surveys Ph.D.s in this area. The data on medical sciences 
are therefore incomplete, unless persons have for 
example received both the Ph.D. and the M.D. 

The SDR may be used to document the faculty 
population by gender, race, rank, and tenure for a sample 
of S&E higher education faculty, but only for the 29 
postsecondary occupation codes. An example of the 
limitations of this taxonomy may be seen in the grouping 
for “Life and Related Sciences,” which has only four 
occupation codes: 

282710 Postsecondary teachers - Agriculture; 

282730 Postsecondary teachers - Biological 
scientists; 

282870 Postsecondary teachers - Medical science; 
and 

282970 Other postsecondary teachers - Natural 
sciences. 

The SDR does allow for estimates of postdoctoral 
data, in and out of academe, that are not included in the 
GSS (Regets, 1998). It is possible to estimate the 
percentage of Ph.D. recipients entering post-docs by 
occupation and field of degree. These data are sometimes 
used for doctoral unemployment studies by discipline 
and industry. However, the SDR is inadequate for 
documenting the faculty population by detailed discipline, 
even for S&E faculty. 

Since the SDR is only a sample, it is not appropriate 
to estimate data at the institutional level. If SDR data are 
to be used for peer comparisons, it is necessary to 
aggregate the data up to a combination of educational 
institution type and control. Not all institutional variables 
such as Carnegie classification are readily available in 
the file. SDR microdata must be remapped by institutional
codes to a lookup table of Carnegie classification data. 
NSF staff are working to include more institutional 
identifiers for further analysis. 
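
The remapping described above can be sketched in Python as a lookup followed by a weighted aggregation to institution type and control. Everything in the sketch is hypothetical: the institution codes, the lookup table contents, the record layout, and the weights are placeholders rather than actual SDR or Carnegie file structures.

```python
from collections import defaultdict

# Hypothetical lookup table: institution code -> (Carnegie class, control).
carnegie = {
    "A001": ("Research I", "Public"),
    "A002": ("Research I", "Public"),
    "B001": ("Doctoral I", "Private"),
}

# Hypothetical respondent records: (institution code, sampling weight, salary).
microdata = [
    ("A001", 10.0, 60_000),
    ("A002", 30.0, 70_000),
    ("B001", 20.0, 65_000),
    ("Z999", 5.0, 50_000),  # code missing from the lookup table
]

sums = defaultdict(lambda: [0.0, 0.0])  # group -> [weighted salary, total weight]
unmatched = []
for code, weight, salary in microdata:
    group = carnegie.get(code)
    if group is None:
        unmatched.append(code)  # report rather than silently drop
        continue
    sums[group][0] += weight * salary
    sums[group][1] += weight

group_means = {g: ws / w for g, (ws, w) in sums.items()}
print(group_means, "unmatched:", unmatched)
```

Estimates are produced only for type-and-control groups, never for a single institution, which is consistent with the caution that a sample survey should not be used for institution-level estimates.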

The longitudinal component of the SDR collects age- 
related data useful for modeling faculty retirement and 
longitudinal data useful for modeling rank transitions. 
However, due to major survey design and instrument 
changes, users should consult with NSF staff before 
using SDR data longitudinally. 

(4) National Survey of Recent College Graduates
(NSRCG) 

This survey, administered by Westat, Inc. for NSF, 
gathers data about people who obtained a bachelor’s or 
master’s degree in science and engineering since 1990.
In contrast, the National Survey of College Graduates 
(NSCG) gathers comparable data on persons who obtained 
at least a bachelor’s degree prior to 1990. The NSRCG 
has been conducted approximately every two years since 
1976, with 1997 the most current available. Both surveys 
were designed to be similar in many respects to the SDR 
and the three make up NSF’s SESTAT system. Information 
about the NSRCG is available on the Web at
http://www.nsf.gov/sbe/srs/snsrcg/.

The 1991-1992 Integrated Postsecondary Education 
Data System (IPEDS) survey data were used to construct 
the 1995 sample of schools. Of these, 102 were selected 
with certainty because of their size and the number of 
S&E degrees awarded. Another 173 institutions
were selected based on stratification by control, region, 
and the percent of degrees in S&E. Each institution 
provided a roster of graduates receiving the bachelor’s or 
master’s degree in an S&E field. 

Using the rosters, 13,893 bachelor’s and 7,107 master’s
recipients from 275 institutions were surveyed in 1995.
These graduates were chosen based on stratification by 
year of degree, major, degree status, and Native American 
status. Data were primarily collected using CATI, with 
some data provided by mail for those who could not be 
contacted by phone. Of a total 21,000 sampled cases, 
there were 16,338 respondents for a response rate of 
83.2%, excluding those found ineligible. Best coding and 
sequential hot deck imputation were used as necessary 
for incongruent or missing data. 
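
Sequential hot deck imputation can be illustrated with a short Python sketch. In this simplified version, records are processed in file order and a missing value is filled with the most recently reported value from a record in the same class. The source does not describe the imputation classes or sort order actually used for the NSRCG, so the class variable (major), the field (salary), and the data below are hypothetical.

```python
def sequential_hot_deck(records, field, class_key):
    """Fill missing `field` values with the last reported value seen in the same class.

    Records are processed in order. A record with no earlier donor in its
    class is left missing. Input records are not modified.
    """
    last_donor = {}  # class -> most recently reported value
    result = []
    for rec in records:
        rec = dict(rec)  # work on a copy
        rec["imputed"] = False
        cls = rec[class_key]
        if rec[field] is None:
            if cls in last_donor:
                rec[field] = last_donor[cls]
                rec["imputed"] = True
        else:
            last_donor[cls] = rec[field]
        result.append(rec)
    return result


# Hypothetical respondents, already sorted into processing order.
sample = [
    {"major": "Biology", "salary": 30000},
    {"major": "Biology", "salary": None},
    {"major": "Physics", "salary": None},
    {"major": "Physics", "salary": 41000},
    {"major": "Physics", "salary": None},
]

filled = sequential_hot_deck(sample, "salary", "major")
```

The "imputed" flag is kept so that analysts can distinguish reported values from imputed ones.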

The NSRCG should not be confused with the Recent 
College Graduates (RCG) survey conducted by NCES 
between 1976 and 1991. In 1993, NCES established a 
longitudinal survey of graduating college seniors which 
replaced the RCG Study. Baccalaureate and Beyond 
(B&B) is supposed to follow an “oversample of graduating 
seniors from the National Postsecondary Student Aid 
Study.” The B&B was designed to “determine how many 
graduates become eligible or qualified to teach for the first 
time and how many were employed as teachers in the 
year following graduation, by teaching.” The B&B is also 
designed to evaluate “the relationship between courses 
taken, student achievement, and occupational outcomes” 
(NCES, 1998, http://nces.ed.gov/surveys/b&b.html). 

Like the other SESTAT databases, the occupation field 
for faculty in the NSRCG is reported for 29 postsecondary 
occupations, and then only for persons with S&E degrees. 
The technical notes about the survey explain that “individuals 
do not always know the precise definitions of occupations 
that are used by experts in the field and may thus select
occupational fields that are technically incorrect” (NSF,
1997, http://www.nsf.gov/sbe/srs/nsf97333/secta.htm). The 
use of occupation codes was simplified between the 
1993 and 1995 surveys. Occupation codes were 
recognized as a problematic variable and best coding 
practices were used, resulting in two occupation codes 
for each respondent in the three surveys - reported and 
best code. 

The NSRCG survey provides a portrait of the faculty 
population by gender within race for the 29 possible 
postsecondary occupations, but only for each general 
type of academic institution. Since the survey is designed 
to reach new graduates, the results allow researchers to 
study the new faculty population in institutions which do 
not require a doctorate, such as community colleges. It 
is necessary to aggregate to the general type of 
educational institution where faculty are employed. 
Carnegie, control, rank, and tenure data are not available. 

Key variables which are documented include: 
citizenship; country of birth; country of citizenship; birth 
date; disability status; educational history; employment 
status (unemployed, employed part time, or employed full 
time); field of degree(s); field of study; geographic place 
of employment; labor force status; level of degree(s); 
marital status; number of children; occupation; occupation 
5 years ago; primary work activity (e.g., teaching, basic 
research, etc.); race and ethnicity; salary; school 
enrollment status; sector of employment (e.g., academia, 
industry, government, etc.); gender; and years of 
professional experience. 

Data from the NSRCG are used to produce the detailed 
statistical tables in the series Characteristics of Recent 
Science and Engineering Graduates, with 1995 the most
recent available. Some of the NSF “Issue Briefs” and “Data 
Briefs” incorporate these data, for example Tsapogas’ 
(Forthcoming) analysis “Will Small Business Become the 
Nation’s Leading Employer of Grads with Bachelor’s 
Degrees in S&E?” 

The SESTAT system includes NSRCG data for 1993 
and 1995. Data from the April 1997 reference period are 
the most current available microdata. 

(5) National Survey of College Graduates (NSCG) 

This longitudinal survey is administered by the Bureau 
of the Census for NSF and gathers data about persons 
with education and/or employment in S&E. The 1995 
sample was selected from respondents to the 1993 NSCG
and the 1993 NSRCG. Information about the NSCG is 
available on the Web at http://www.nsf.gov/sbe/srs/snscg/. 

Key variables are identical to the NSRCG, listed above. 
On the public use file, the NSCG provides data on gender 
within race for the 29 possible postsecondary occupations 
by general type of educational institution. The data on 
age and other demographic and work-related variables 
may be useful in assumptions and models about
comprehensive, liberal arts, and two-year institutions where 
the Ph.D. is not required. Carnegie, control, rank, and 
tenure data are not available. 

The 1995 sample was stratified by demographic group,
highest degree, and gender. A mail survey was sent to 
61,891 individuals, with 47,912 respondents. 
Approximately 9,760 nonrespondents were sub-sampled 
for administration of CATI or computer assisted personal 
interview (CAPI) follow-up. There were a total of 53,348 
eligible respondents for a response rate of 86.2%. 

Several NSF publications incorporate data from the 
NSCG, including Science and Engineering Indicators and 
Women, Minorities, and Persons with Disabilities in
Science and Engineering. SESTAT includes 1993 and
1995 NSCG data. Data from April 1997 are the most 
current available. Like the NSRCG, this survey may be 
used to examine faculty microdata for faculty who do not 
have a doctorate. 

(6) NCES IPEDS Survey of Earned Degrees (C) 

The Integrated Postsecondary Education Data System 
(IPEDS) Completions Survey is an annual NCES survey 
of all postsecondary institutions in the U.S. and outlying 
areas. The survey documents degrees completed by 
level by race/ethnicity, gender, and Classification of 
Instructional Program (CIP) code. Information about the 
degrees conferred survey is available on the Web at 
http://nces.ed.gov/Ipeds/completions.html.

No weighting techniques are used, since the entire 
population is surveyed at the school level. Response 
rates range from 85% to 96%. Data on non-responding 
institutions are imputed from the previous year’s data in 
order to complete national estimates. Out of the universe
of 6,698 institutions included in the third release of 1996-97
data, a total of 6,304 completed the survey for a
response rate of 94.2%.
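
A minimal Python sketch of this kind of carry-forward imputation is shown below. It assumes the simplest reading of "imputed from the previous year’s data," in which the prior-year value is used unchanged; the actual NCES procedure may adjust that value. The institution identifiers and counts are hypothetical.

```python
def national_total(universe, current, previous):
    """Total degrees across the universe, carrying forward prior-year values.

    Returns (total, imputed_ids, unresolved_ids). An institution with neither
    a current nor a prior-year value contributes nothing and is reported as
    unresolved.
    """
    total = 0
    imputed_ids = []
    unresolved_ids = []
    for unit_id in universe:
        if unit_id in current:
            total += current[unit_id]
        elif unit_id in previous:
            total += previous[unit_id]  # carry forward last year's count
            imputed_ids.append(unit_id)
        else:
            unresolved_ids.append(unit_id)
    return total, imputed_ids, unresolved_ids


# Hypothetical universe of four institutions, two of which responded.
universe = ["A", "B", "C", "D"]
current = {"A": 120, "B": 45}
previous = {"A": 110, "B": 50, "C": 30}

total, imputed_ids, unresolved_ids = national_total(universe, current, previous)
response_rate = 100 * len(current) / len(universe)
```

Reporting which institutions were imputed lets an analyst judge how much of a national estimate rests on prior-year data.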

The IPEDS completions data are available from a 
number of sources: 

(1) data from 1987 through 1995 are available on 
WebCASPAR at the institution level; 

(2) the raw data for 1989-90 to 1996-97 are also 
available for downloading from the NCES IPEDS 
Website. These data files include SAS read 
programs and Access file formats, along with 
data dictionaries and other documentation; 

(3) five years of historical data, updated annually, 
are available on the IPEDS CD-Rom, available 
each year free of charge from NCES; 

(4) it is now possible for users without CD-Rom 
access to run the IPEDS CD software remotely, 
using Citrix Winframe, at the Website 
http://nces.ed.gov/citrix;

(5) aggregate degree data are offered by the NCES 
IPEDS Interactive Database Search Website 
http://nces.ed.gov/IPEDSEARLYRELEASE/. The
most current data available are 1995-96. Users 
need to have the survey form available to interpret 
line numbers and subtotals, but current copies of 
forms are available on the site; 

(6) through the NCES National Data Resource 
Center, which was established to meet the needs 
of education officials and policy analysts to “obtain 
special statistical tabulations and analyses of data 
sets maintained by NCES.” Before the data were
available for FTP, users had to go to NDRC to get 
diskettes or tapes of the surveys; 

(7) commercially available from John Minter Associates 
on the “Higher Education Data CD 99” with nine years 
of data. See http://www.edmin.com/jma/page2.html; 
and 

(8) Dr. Vic Borden and Tim Thomas at IUPUI received 
an NSF/NCES/AIR research grant to study “forms 
and formats for delivering information derived from 
national IPEDS data sets.” They plan to make 
degree completions data available on the Web. 
See http://www.imir.iupui.edu for details. 

Using the completions survey data, it is possible to 
document the number of doctoral graduates by gender 
within race by CIP code at each institution. The data on 
master’s degrees may be useful for some models of 
doctoral enrollment demand and for predicting faculty 
supply for community and two-year colleges. 

(7) NCES IPEDS Survey of Salaries, Tenure, and 
Fringe Benefits of Full-time Instructional 
Faculty (SA) 

The IPEDS SA collects aggregate salary and fringe 
benefits data by institution on full-time, instructional faculty, 
with breakouts by gender, rank, tenure, and contract length. 
Information about the SA is available on the Web at
http://nces.ed.gov/Ipeds/facultysalaries.html.

SA data are available from numerous sources, among 
them: 

(1) salary data from 1971 to 1995 and fringe benefit 
data from 1977 to 1995 are available on 
WebCASPAR; 

(2) 1989-90 to 1997-98 files are available for 
download from the NCES Website; 

(3) the IPEDS CD-Rom (five years of historical data); 

(4) the NCES Interactive IPEDS Database search 
site (1996-97 most current); 

(5) the IPEDS Interactive Database at Arizona State
University, developed by Gene Glass and 
available at http://129.219.88.111/ipeds/. Data 
from 1994-95 to 1996-97 are currently provided 
in a format designed for peer comparison; 

(6) the NCES National Data Resource Center; and 

(7) commercially available from John Minter 
Associates “Faculty Salary CD” with 1997-98 
data, including benchmarks of average faculty 
salary trends from 1992 to 1997. See 
http://www.edmin.com/jma/sa97web.htm for 
more information. The “Higher Education Data 
CD 99” provides nine years of SA data. 

All of the institutional identifiers such as Carnegie 
classification, control, and religious affiliation which are 
collected in the IPEDS Institutional Characteristics survey 
are also provided in these datasets. Approximately 3,907 
institutions were included in the universe in 1997-98, with
3,647 respondents for a response rate of 93.4%. 

It is important to note that the SA includes only full-time 
instructional faculty. It does not include faculty whose 
duties are 50% or more in research, service, or 
administration. For this reason, the SA is the best estimate 
of the total full-time faculty teaching population. Historical 
SA data are useful for tracking the growth of non-tenure 
track positions by type of institution. The SA is comparable 
to the American Association of University Professors 
(AAUP) faculty survey in its collection of salary and benefits 
expenditures and may be substituted for the AAUP 
submission. The SA does not collect benefits data by 
rank, only by contract length. 

Users must be careful when calculating salary
averages from the SA, since the results may differ from
those reported by the AAUP. In calculating average
salaries from sources such as WebCASPAR and the raw
data files, users must equate 11/12 month salaries to the
9/10 month contract length (multiply by .81818) to have
results comparable to AAUP. Also, NCES suppresses
cells in the data which contain three or fewer individuals.
For submissions that include these cells, the AAUP and
NCES calculations of average salaries will always differ.
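
The contract-length adjustment can be sketched in Python as follows. The factor .81818 is 9/11 to five decimal places. The cell layout (headcount, average salary, contract length) and the figures are hypothetical and are not taken from the SA file format.

```python
NINE_OVER_ELEVEN = 9 / 11  # approximately 0.81818, the factor cited in the text


def average_salary_9_month(cells):
    """Headcount-weighted mean salary on a 9/10-month basis.

    Each cell is (headcount, average_salary, contract), where contract is
    "9/10" or "11/12". Salaries on 11/12-month contracts are scaled by 9/11
    before averaging.
    """
    total_pay = 0.0
    total_heads = 0
    for headcount, avg_salary, contract in cells:
        if contract == "11/12":
            avg_salary = avg_salary * NINE_OVER_ELEVEN
        elif contract != "9/10":
            raise ValueError(f"unknown contract length: {contract}")
        total_pay += headcount * avg_salary
        total_heads += headcount
    if total_heads == 0:
        raise ValueError("no faculty in the supplied cells")
    return total_pay / total_heads


# Hypothetical cells for one institution and rank.
cells = [
    (10, 50000.0, "9/10"),
    (5, 66000.0, "11/12"),  # 66,000 on an 11/12-month basis is 54,000 at 9/10
]
average = average_salary_9_month(cells)
```

Weighting by headcount matters here: a simple mean of the two cell averages would give equal influence to the 5-person and 10-person cells.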

For documenting the faculty population, the SA provides 
aggregate data on gender within rank within tenure at the 
institutional level. Along with the IPEDS Fall Staff Survey 
(S), the SA may be used as a population estimate of 
faculty totals by Carnegie classification and control. These 
data may also be used as validity checks for other 
estimates of the total, full-time, instructional faculty 
population, such as sampled by the National Study of 
Postsecondary Faculty (NSOPF). 

(8) NCES IPEDS Fall Staff Survey (S) 

In 1993, the IPEDS Fall Staff survey (S) replaced the 
EEO-6 survey administered by the Equal Employment
Opportunity Commission. Prior to this, both surveys collected
data on higher education full- and part-time faculty and staff 
biennially in odd-numbered years. The S documents every 
other year the number of full- and part-time staff by EEO 
category (occupational activity), gender within race/ethnicity, 
and salary range. It also includes a version of the SA form, 
broken out by gender within race/ethnicity for all full-time 
instructional, research, and public service faculty. Information
about the S is available on the Web at
http://nces.ed.gov/Ipeds/fallstaff.html.

The 1997 IPEDS S survey includes 6,777 
postsecondary schools, with 6,194 respondents for a 
response rate of 91.4%. Data are imputed for missing 
schools, based on previous years’ submissions. 

Staffing data are available from several sources: 

(1) biennial files from 1991 to 1997 are available for 
download at the NCES Website; 

(2) the IPEDS CD-Rom, with historical data; 

(3) the NCES National Data Resource Center; and 

(4) commercially available from John Minter
Associates “Staff 98 CD” with 1997-98 data. See
http://www.edstats.net/staff98cd/ for more
information. The “Higher Education Data CD 99”
provides five biennial surveys’ worth of S data.

The bulk of the IPEDS S is devoted to collecting data 
on the broad occupational categories developed by the 
EEOC for affirmative action reporting. These include: 
executive/administrative/managerial, other professionals 
(support/service), technical and paraprofessionals, clerical 
and secretarial, skilled craft, and maintenance. Data are 
collected on part-time employees, including part-time 
faculty (with teaching, research, and service combined). 
Another table of information about new hires is collected, 
broken out for total, full-time, instructional, research, and 
service faculty by gender and ethnicity. Unfortunately, 
these are not further broken out by tenure status or rank. 
These data on hiring reflect the only national population 
source on the number of part-time employees and the 
number of new faculty hired by the universe of institutions. 

In documenting the faculty population, the IPEDS S 
provides aggregate, institutional data by gender within 
ethnicity by rank within tenure. The data on new hires are
useful in predicting an annual growth rate by institution. It 
is also possible to take data by institution from the S, 
subtract data from the SA, and estimate the number of 
full-time research and public service faculty, something 
not collected directly by either survey. Tenure and rank 
issues for research and service faculty may also be 
analyzed using this comparison between S and SA data. 

While the SA report is often completed by institutional 
research staff, and like the AAUP survey is central to 
national analyses of faculty compensation, the S is 
sometimes not given as much analytical scrutiny in its 
preparation. The survey may be compiled by human resources 
office staff, with different dates and selection criteria than those
used for the SA. Therefore, any analysis of the relationship 
between S and SA data must be done cautiously. 
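
The S-minus-SA comparison can be sketched in Python as follows. Because the two surveys may be prepared with different dates and selection criteria, this illustration sets aside any institution where the difference is negative instead of reporting it as an estimate. The institution identifiers and counts are hypothetical.

```python
def estimate_research_service_faculty(s_counts, sa_counts):
    """Estimate full-time research and public service faculty per institution.

    s_counts: full-time instructional, research, and service faculty from the S.
    sa_counts: full-time instructional faculty from the SA.
    Returns (estimates, flagged), where flagged lists institutions whose SA
    count exceeds their S count, a sign of inconsistent submissions.
    Institutions without an SA count are skipped.
    """
    estimates = {}
    flagged = []
    for unit_id, s_total in s_counts.items():
        if unit_id not in sa_counts:
            continue  # no SA submission to compare against
        difference = s_total - sa_counts[unit_id]
        if difference < 0:
            flagged.append(unit_id)
        else:
            estimates[unit_id] = difference
    return estimates, flagged


# Hypothetical counts for three institutions.
s_counts = {"A": 900, "B": 300, "C": 150}
sa_counts = {"A": 700, "B": 320}

estimates, flagged = estimate_research_service_faculty(s_counts, sa_counts)
```

A non-negative difference is not proof that the two submissions are consistent, so the estimates should still be treated with the caution noted above.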

(9) CUPA National Faculty Salary Survey by 
Discipline and Rank 

The College and University Personnel Association 
(CUPA) collects two faculty salary surveys, one for public 
and one for private institutions. The public version was 
first piloted in 1981 and the private version followed the 
next year. The survey collects data on the number and 
salaries of faculty by clusters of CIP code-level disciplines. 
Information about the CUPA survey is available on the 
Web at http://www.cupa.org/cbsurvey/compbene.htm. 

A total of 357 public institutions completed the CUPA 
survey for 1997-98, documenting a population of 
approximately 110,000 faculty. Many of the participating
institutions are members of the American Association of 
State Colleges and Universities (AASCU). A total of 544 
private institutions participated, documenting a population 
of approximately 60,000 faculty. 

The survey has historically collected data by gender 
and rank within discipline, with minimum, maximum, and 
average salaries for full-time, instructional faculty. 
With the 1998-99 survey, CUPA began to collect
data on three levels of non-ranked researchers. Research 
I positions are comparable to postdoctoral fellows and 
these data may be very helpful in supplementing those 
from the GSS and SDR. Research II positions are 
intermediate level research scientists, research 
associates, or research engineers who contribute 
significantly to project activities with independent research. 
These may be comparable to non-faculty research staff 
listed on the GSS. Senior level Research III positions 
such as senior research scientist or senior research 
engineer are responsible for research projects, usually 
hold an advanced degree, and have four or more years 
of “high-level research experience.” 

Only data that fit into the survey’s unique combinations 
of CIP code taxonomy are collected, so the results may 
not be used as an estimate of the total faculty population 
at participating institutions. 

The survey is administered by Richard D. Howe at 
Appalachian State University and analyzed by the 
Oklahoma State University institutional research office, 
which also administers the Oklahoma State Faculty Salary 
Survey. A data book is published each year. Participating 
institutions may purchase customized studies with special 
data tabulations for $250. These studies do not include 
institutional identifiers. However, it is always possible to 
request individual CUPA survey submissions from 
institutions directly. 

Data are also collected for new assistant professors 
as a subset of the assistant professor data. These are 
very useful for estimating the number of new assistant
professor hires by discipline to document benchmarks of 
the salary marketplace. For example, the data may be 
used to verify issues of salary compression, illustrating 
disparities between the salaries of new and existing faculty 
in a given discipline. 

The sample size does not permit extensive 
extrapolation. The CUPA publication arrays salary 
averages by Carnegie classification, control, and region. 
Participating schools also receive benchmark analyses of 
their data versus national averages by CIP code. The data 
on new assistant professors are useful in making
assumptions about the number of new hires by discipline. 
These data are often used to document equity issues in 
faculty salaries by discipline for men and women. 

(10) Oklahoma State Faculty Salary Survey 

The annual Oklahoma State Faculty Salary Survey is 
comparable to the CUPA survey, but is expanded to all 
CIP codes used by participating institutions. It is limited 
to members of the National Association of State Universities 
and Land Grant Colleges (NASULGC) who award 
doctorates in at least five different areas. Eighty-four 
schools participated in 1997-98. The survey is 
administered by the institutional research office at 
Oklahoma State University. No information is currently 
available on the Web. 

Annual data are collected electronically for full-time 
instructional faculty by gender and rank, with a breakout 
for new assistant professors. Every other year, ethnicity 
data are also collected, with approximately 65 institutions 
participating. In order to be listed in the analytical reports 
which aggregate data by discipline, a CIP code must be 
used by more than a few institutions. If there is no match, 
the data are rolled into the other (99) version of the four 
digit CIP code and then, if necessary, aggregated to the 
(01) version of the CIP code. 

Data reports are provided to participating institutions, 
but without institutional identifiers. Institutional research 
offices may request special studies of their peers at a cost 
of $120. This allows them to weight the data to match 
their own profile of disciplines. Reports on faculty salaries 
by CIP code are published at a cost of $60, though 
participants receive a free copy. The “Distribution Study” 
is published every two years with the ethnicity data, with 
the 1997-98 data published in Summer, 1998. Some 
institutions exchange data files with their peers. A subset 
of institutions is routinely analyzed by the University of 
Alabama for the Southern University Group (SUG). 

For documenting the faculty population, the Oklahoma 
State survey provides data on gender within ethnicity by 
rank at the CIP code discipline level, but only for a 
relatively small, somewhat homogeneous sample of land 
grant institutions. This is still a critical source for setting 
faculty availability statistics by discipline, gender, and
ethnicity for the eight factor analyses for affirmative action 
reporting. 

(11) National Study of Postsecondary Faculty 
(NSOPF) 

The NSOPF survey was conducted in 1988 and in 1993 
by NCES, with support from NSF and the National 
Endowment for the Humanities, and will be administered 
again in 1999. In 1993, institutional and faculty versions of 
the survey were collected. A department chair survey was 
administered in 1988. The 1993 NSOPF was administered by 
the National Opinion Research Center (NORC) at the University 
of Chicago. Information about the NSOPF is available on the 
Web at http://nces.ed.gov/surveys/nsopf.html. 

The NSOPF is the primary national survey of faculty 
activities, demographics, and attitudes. A two stage 
sampling procedure was used for the faculty questionnaire. 
First, 974 institutions were contacted, of which 817 agreed
to participate. These institutions provided lists of faculty 
by discipline. Limited disciplinary data were keyed in 
order to over-sample four NEH disciplines. The sampling 
rate was also increased for full-time women and minorities. 
From the lists, samples with a measure of size of 41.5
faculty (41 or 42) per institution were developed, stratified 
by Carnegie classification and control. Most public and 
private research universities and most public doctoral 
universities were included (with certainty) in the sample. 
A total of 25,780 surveys were completed for a response 
rate of 86.6%. 

In analyzing the NSOPF data for 1988 and 1993, 
anomalies were detected in the number of part-time and 
health science faculty. The initial Data Analysis System 
(DAS) and analyses were revised and re-released after it 
was determined that the survey was not adequately 
administered to medical school faculty and that the weights 
of part-time faculty were incorrect due to problems in the 
institutional lists. The part-time issue has been corrected, 
but NSOPF still under-reports health science faculty. 

While data on discipline were collected with 149 possible 
fields, the sample was not stratified by discipline. For this 
reason and because of the problems with health sciences, 
the data should not be interpreted for population estimates 
by discipline, except by the broadest clusters of disciplines. 
For all other uses, such as comparisons of faculty workload, 
the discipline-specific data are sufficient. 

The NSOPF data are available from these sources: 

(1) in a data analysis system (DAS) on CD-Rom 
from NCES. The CD also includes data on most 
other non-IPEDS surveys administered through 
NCES, each with its own DAS. The Windows-based
software allows filtered, two-dimensional
cross-tabs. Two versions of the software are 
provided, one for regular tables and one that 
produces correlation matrices for further analysis
in SAS or SPSS. The software produces a tab 
delimited text file with information on weights, cell 
counts, and standard errors; 

(2) DAS on the Web, an Internet-based version of 
the DAS. Users download and install the software, 
then upload/submit a table parameter file (TPF) 
to run queries. The DAS Website processes the 
TPF, generates a table, and the user picks up a 
PRN file with the results from an FTP directory. 
The Web DAS is kept current with recodes and 
new data administration, while the CD only 
documents the file at a point in time; 

(3) microdata are available for controlled use under 
licensing agreements with NCES; 

(4) the National Data Resource Center is able to 
produce data tables from the NSOPF if the 
required analysis cannot be easily obtained with 
the DAS. 

Numerous reports and studies of the NSOPF data are now 
available, including Faculty and Instructional Staff: Who Are 
They and What Do They Do? and Institutional Policies and 
Practices Regarding Faculty in Higher Education. A list of
publications, most available online in PDF format, is provided at
http://nces.ed.gov/pubsearch/getpubcats.idc?sid=011.

It is important to note that the definition of faculty used 
for the NSOPF differs from that of the IPEDS S and SA. 
In order to gather data on all types of teachers, the 
institutional lists included full-time, part-time, permanent, 
and temporary instructional faculty and staff, along with 
non-instructional faculty. This is an important source of 
information on part-time and temporary staff. However, 
the reader must be careful in interpreting tables of NSOPF 
data to ensure that the correct faculty definition is used. 

In weighting the sample to the population, NORC first 
weighted the respondents by institutional type to the lists 
from institutions (approximately 500,000 faculty names). 
These data were then weighted again by institution to the 
17 possible strata of Carnegie classification and control 
and the total faculty population as documented in the 
IPEDS S. The number of strata is uneven because there 
are no public, religious institutions. 
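
The idea of weighting respondents up to known population totals by stratum can be illustrated with a simplified, single-step Python sketch. The NORC procedure described above has more than one stage and is not reproduced here; the stratum names and counts below are hypothetical.

```python
from collections import Counter


def poststratification_weights(respondent_strata, population_totals):
    """Return one weight per stratum: population total / number of respondents.

    respondent_strata: the stratum label of each respondent.
    population_totals: known population count for each stratum.
    Only strata with at least one respondent receive a weight.
    """
    counts = Counter(respondent_strata)
    return {
        stratum: population_totals[stratum] / n
        for stratum, n in counts.items()
    }


# Hypothetical respondents in two strata of institutional type and control.
respondent_strata = ["public research"] * 4 + ["private liberal arts"] * 2
population_totals = {"public research": 2000, "private liberal arts": 500}

weights = poststratification_weights(respondent_strata, population_totals)
weighted_total = sum(weights[s] for s in respondent_strata)
```

By construction, the weighted respondent count in each stratum equals that stratum's population total, which is why the totals from a population survey are needed as a benchmark.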

The institutional survey gathers information about 
instructional and non-instructional faculty hires, retirements, 
tenure policies, benefits, evaluation procedures, and 
downsizing for all types of faculty. Totals of instructional and 
non-instructional tenured and tenure track faculty were 
collected for Fall 1991 and Fall 1992 by institution. Faculty 
eligible for and granted tenure are documented. These 
types of data are extremely valuable in making assumptions 
about faculty mobility and the impact of certain policies such 
as early retirement programs. 

Final technical reviews have been completed for the 
1999 NSOPF survey. According to NCES staff, the
institutional lists will be sorted by discipline and each 
institutional sample size will vary in order to better estimate 
the population by discipline. 

(12) HERI Faculty Survey 

The Faculty Survey administered by the Higher Education 
Research Institute (HERI) of the University of California - 
Los Angeles is very similar to the NSOPF in its focus on 
faculty demographics, activities, and attitudes. Information 
about the HERI survey is available on the Web at
http://www.gseis.ucla.edu/heri/facultysurvey98.html.

The survey administered in 1995 included 384 
institutions and 33,986 respondents, for an overall 
response rate of 42%. Over the course of six surveys 
conducted since 1969, HERI has gathered data on 
500,000 faculty at 1,000 institutions. The survey was 
administered again in 1998-99. This survey is an 
invitational sample and HERI charges institutions a fee, 
similar to the administration of the UCLA CIRP Freshmen 
survey. For 1998-99, the institutional cost was $325 plus 
$3.25 per returned survey. 

For the purpose of the survey, faculty are defined broadly. 
Depending upon whom institutions chose to sample, the 
survey includes employees who teach undergraduates, 
full-time administrators, full-time researchers, and faculty 
who teach only at the graduate level. 

The publication The American College Teacher: 
National Norms for the 1995-96 HERI Faculty Survey 
reports the results of this survey and is sent to participating 
institutions. The book is also available for purchase from 
HERI. The HERI sends a standard set of cross-tabs of 
the data to institutions and will prepare additional analysis 
of the data for a fee. 

“National Norms” were developed based on the portion 
of respondents who code themselves as undergraduate 
teaching faculty. The norms include all institutions which 
surveyed a minimum percentage of their faculty 
population, as determined from analysis of IPEDS reports. 
The list of participating institutions was examined “using 
a 23 cell stratification based on institutional type, 
selectivity, and control” (Sax et al., 1996, p. 1). The
sample was supplemented with 21 randomly selected 
institutions for the cells with low counts. The participation 
of 22 additional institutions was supplemented with funding 
from the Corporation for National Service. 

For documenting the faculty population, the HERI 
faculty questionnaire provides data on gender within 
ethnicity by rank within tenure status by Carnegie 
classification and control. The instrument does not 
permit coding as non-tenure track, only if and when 
tenure was awarded. Many types of faculty policy 
questions may be addressed. It is particularly interesting 
to examine changes in the data over time.



(13) Doctoral Program Rankings - 1995 

The National Research Council (NRC) collected data 
on faculty as part of its doctoral program rankings project 
in 1982 and 1995. Information about the NRC Doctoral 
Rankings project is available on the Web at 
http://www.nap.edu/readingroom/books/researchdoc/. 

For the 1995 study, the NRC gathered data on 41 fields
selected because of three factors: the number of Ph.D.s 
produced nationally, the number of programs training 
Ph.D.s within a particular field, and the average number 
of Ph.D.s produced per program. 

Based on reports from Institutional Coordinators (ICs) 
who provided information about their programs, 3,634 
research-doctorate programs at 274 U.S. universities were
targeted in the 1995 project. Of these, 105 were private 
and 169 were public institutions. Data on specific faculty 
were taken from “various sources of information,” including 
the Doctorate Records File of SED data. Using the 
combination of IC reports and faculty survey instruments,
data were gathered about program ratings, Ph.D. 
recipients, women and minority enrollment and degree 
patterns, and the number of faculty. 

Sources for data on the NRC Doctoral Program rankings 
include: 

(1) the book Research-Doctorate Programs in the 
United States: Continuity and Change (NRC, 
1995); 

(2) an executive summary of the book, along with 
HTML versions and Excel spreadsheets of key 
tables, which are available online; 

(3) a CD-Rom, available for purchase, which 
includes all data used in the study, including 
faculty names; and 

(4) WebCASPAR, which includes the 1982 NRC data 
(actually collected in 1980, with publication 
data from even earlier). Hopefully, NSF will add 
the 1995 data soon. 

It is possible to document the faculty population by 
discipline for research programs, using the NRC data. 
Massy and Goldman (1995) did this using the 1980 NRC 
data for their classic study The Production and Utilization 
of Science and Engineering Doctorates in the United 
States. Data about gender, ethnicity, and tenure status of 
faculty are not collected. The NRC data allow users to 
get a portrait of research programs by discipline by 
institution. They are an excellent source for many kinds of 
benchmarks of research productivity by discipline. 

(14) AAUP Faculty Compensation Survey 

The American Association of University Professors 
(AAUP) collects data similar in many ways to those 
collected with the IPEDS SA. These include aggregate, 
institutional data on salaries, fringe benefits, and the 
headcount number of full-time instructional faculty by 
gender, rank, and contract length. In addition, the AAUP 
survey gathers data on salaries and percentage increases 
for continuing instructional faculty, allowing it to calculate 
yearly trends in faculty salaries. The AAUP also collects 
benefits data by rank and contract length. Information 
about the AAUP survey is available on the Web at 
http://www.igc.apc.org/aaup/indexfcs.htm. 

Data from the survey are published in the March/April 
issue of the magazine Academe as the “AAUP Annual 
Report on the Economic Status of the Profession.” The 
report details salary increases against inflation, analyzes 
geographic differences, compares salaries by Carnegie 
classification, and focuses on gender-based salary 
disparities. Over 2,600 institutions are included in the 
1997-98 analysis, in contrast to the 3,907 schools 
surveyed by the IPEDS SA. The data are also published 
in print and electronically by the Chronicle of Higher 
Education. 

Custom faculty data may be ordered from AAUP for 
peer comparisons. Seven different items of data are offered, 
at a cost of between two and six dollars per school per item. 
Institutional identifiers are provided. Since the bulk of the 
survey data are identical to the SA, the AAUP would normally 
offer little of new interest for researchers except the salary 
increase comparison. However, the AAUP data are 
released in March/April, months before the first release of 
SA data. Due to this timing, the AAUP is the primary 
source for the national dialogue about faculty salaries and 
compensation and the data are widely cited in the media. 
The regional and Carnegie breakouts allow for various 
levels of peer comparisons. 

The AAUP data may be used for many classic studies 
of faculty, such as the demise of tenure, reliance on 
non-tenure track faculty, gender equity, and salary 
compression. 

(15) NCES IPEDS Fall Enrollment Survey (EF) 

This survey collects enrollment data for every 
postsecondary institution eligible to participate in Title IV 
financial aid programs. Institution-specific data are collected 
by race and ethnicity, gender, degree level, full-/part-time 
status, and year of study. Enrollment data by age and 
residency status are also collected as biennial components 
of the survey. Information about the EF is available on the 
Web at http://nces.ed.gov/lpeds/fallenrollment.html. 

The Fall 1997 IPEDS EF survey included 6,645 
postsecondary schools, with 6,278 respondents for a 
response rate of 94.5%. Data were imputed for missing 
schools, based on previous years’ submissions, in order 
to make national enrollment estimates and projections. 

EF data are available from these sources: 



(1) files from Fall 1988 to Fall 1997 are available for 
download at the NCES Website with extensive 
documentation; 

(2) the IPEDS CD-Rom, with five years historical 
data; 

(3) the IPEDS Interactive Database Search; 

(4) the NCES National Data Resource Center; 

(5) on WebCASPAR, with data from 1966 to 1995, 
broken into three files for opening enrollment, 
age, and residency; 

(6) commercially available from John Minter 
Associates on the “Higher Education Data CD 
99” with nine years of data. See 
http://www.edmin.com/jma/page2.html for more 
information; 

(7) ethnicity data for the 1993 Fall enrollment file are 
available on the IPEDS Interactive Database at 
Arizona State University; and 

(8) Dr. Vic Borden and Tim Thomas at IUPUI plan to 
make enrollment data available on the Web. See 
http://www.imir.iupui.edu for details. 

The U.S. Department of Education uses the EF data 
for program planning; setting funding allocation standards 
for loan, work-study, and grant programs; and for updating 
enrollment projections. The data are also used for the 
NCES publications Condition of Education and the Digest 
of Education Statistics. According to NCES, many other 
federal and state agencies rely on these data for “economic 
and financial planning, manpower forecasting, and policy 
formulation” (NCES, 1998, 
http://nces.ed.gov/lpeds/fallenrollment.html). 

For faculty studies, the most common use of the EF is 
to calculate faculty workload ratios. For this purpose, 
headcount is inadequate and a full-time equivalent (FTE) 
student count is appropriate. The NCES has shared its method 
for calculating FTE from headcount using the EF, and this 
method is widely adopted among institutional researchers. 
Each full-time student counts as one FTE without 
any calculation. The part-time student headcount is divided 
by three. The full- and part-time FTE are added together to 
estimate the total student FTE, and it is this student FTE 
figure which is used for many ratios of funding and faculty 
workload. This method is particularly problematic for 
community colleges and urban institutions which enroll a 
large number of part-time students. 
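
The headcount-based estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text: the function names and the enrollment and faculty figures are invented for the example and do not come from NCES or from any IPEDS file.

```python
def student_fte(full_time_headcount: int, part_time_headcount: int) -> float:
    """Estimate student FTE from headcount.

    Full-time students count as one FTE each; the part-time headcount
    is divided by three, as described in the text.
    """
    return full_time_headcount + part_time_headcount / 3


def students_per_faculty(fte_students: float, full_time_faculty: int) -> float:
    """Student FTE per full-time faculty member (a simple workload ratio)."""
    if full_time_faculty <= 0:
        raise ValueError("full_time_faculty must be positive")
    return fte_students / full_time_faculty


# Hypothetical institution: these numbers are invented for illustration.
fte = student_fte(full_time_headcount=9000, part_time_headcount=3000)
ratio = students_per_faculty(fte, full_time_faculty=500)
print(f"Student FTE: {fte:.0f}; FTE students per faculty: {ratio:.1f}")
```

For an institution with mostly part-time students, the fixed one-third weight drives most of the estimate, which is one way to see why the method is sensitive for community colleges and urban institutions.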

(16) NCES IPEDS Institutional Characteristics 
Survey (IC) 

This survey is the foundation of the IPEDS system, 
collecting data such as tuition, room and board, 
unduplicated headcount, and institutional address. Data 
about location, Carnegie classification, highest degree 
awarded, existence of medical school, tribal college status, 
and Historically Black College or University status (HBCU) 
are also documented on the file. Information about the 
IC is available on the Web at 
http://nces.ed.gov/lpeds/ic.html. 

The 1997-98 IPEDS IC universe included 9,896 
postsecondary schools, with 8,921 respondents for a 
response rate of 90.1%. 

IC data are available from several sources: 

(1) files for 1989-90 to 1997-98 are available for 
download at the NCES Website with extensive 
documentation; 

(2) the IPEDS CD-Rom, with five years historical 
data; and 

(3) incorporated into the other IPEDS datasets on 
WebCASPAR, allowing for selection by various 
categories of institutions, such as Carnegie 
classification and control, or by the identifiers 
FICE code and UNITID. 

For faculty studies, the most common use of the IC is 
to calculate student FTE data for faculty workload ratios. 
If the NCES calculation from the EF is inadequate, another 
method is to use the student credit hour (SCH) data 
collected on the IC. For internal purposes, many 
institutions calculate FTE based upon credit hour activity, 
using agreed-upon conventions such as 15 undergraduate 
credit hours equals one FTE or 12 graduate credit hours 
equals one FTE. There are different opinions on 
calculating FTE from SCH for graduate and professional 
activity, such as whether to divide by 12 or 15 SCH. It 
is important to be consistent in this application. 
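
As a rough sketch of the credit-hour conventions just described, the Python below converts SCH to FTE with the divisors as explicit parameters, so that the same choice can be applied consistently across institutions. The function name and the credit-hour totals are hypothetical; only the divisors of 15 and 12 come from the text.

```python
def fte_from_sch(
    undergrad_sch: float,
    grad_sch: float,
    undergrad_divisor: float = 15,
    grad_divisor: float = 12,
) -> float:
    """Convert student credit hours (SCH) to student FTE.

    Defaults follow the conventions mentioned in the text:
    15 undergraduate SCH = 1 FTE and 12 graduate SCH = 1 FTE.
    """
    if undergrad_divisor <= 0 or grad_divisor <= 0:
        raise ValueError("divisors must be positive")
    return undergrad_sch / undergrad_divisor + grad_sch / grad_divisor


# Invented credit-hour totals for one term at a hypothetical institution.
fte_grad_12 = fte_from_sch(undergrad_sch=120_000, grad_sch=12_000)
fte_grad_15 = fte_from_sch(undergrad_sch=120_000, grad_sch=12_000, grad_divisor=15)
print(f"Graduate divisor 12: {fte_grad_12:.0f} FTE")
print(f"Graduate divisor 15: {fte_grad_15:.0f} FTE")
```

With these invented totals, the choice of graduate divisor alone moves the estimate by 200 FTE, which is why a single convention should be fixed before institutions are compared.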

However, NCES staff report that the SCH data reported 
on the IC are inconsistent and not clean enough for 
national use. One topic for discussion in the redesign of 
the IPEDS forms is the possibility of collecting student 
FTE directly on the IC survey, along with SCH. There are 
other problems in comparing the IC data, such as different 
extract dates. 

(17) NCES IPEDS Finance Survey (F) 

This survey collects data on revenues, expenditures, 
scholarships, plant debt, plant assets, endowment, fund 
balances, and hospitals by institution. The finance survey 
is, in many ways, the most complex, misunderstood, and 
potentially fruitful survey in the IPEDS system for peer 
comparison purposes. Information about the F is available 
on the Web at http://nces.ed.gov/lpeds/finance.html. 

Traditionally, the dissemination of IPEDS files does not 
differentiate between public and private control. With 
the establishment of new accounting standards from 
Financial Accounting Standards Board (FASB) and 
Governmental Accounting Standards Board (GASB), 
different editing and data administration procedures are 
necessary for public and private institutions. As of 
February 1999, only the public version of the 1996-97 
IPEDS F has been released. From a universe of 1,802 
public postsecondary schools, there were 1,714 
respondents for a response rate of 95.1%. For the 1995-96 
data, the total number of private and public schools in the 
universe was 3,965, with 3,520 respondents for a response 
rate of 88.8%. 

Finance data are available from several sources: 

(1) files from 1988-89 to 1996-97 are available for 
download at the NCES Website; 

(2) the IPEDS CD-Rom, with five years of historical 
data; 

(3) the NCES National Data Resource Center; 

(4) on WebCASPAR, with data from 1965-66 to 1994-95, 
broken into nine subsets; and 

(5) commercially available in different CD products 
from John Minter Associates, including Higher 
Education Data 99; Profiles of Campus Services, 
Resources, and Budget; Management Ratios #13; 
Finance Survey 1997; and Financial Ratio Norms 
for Independent Institutions. 

For faculty studies, there are several critical ratios 
using the IPEDS finance data. These usually involve 
combinations of instructional expenditures, research 
expenditures, and library expenditures per type of 
full-time faculty (using the S and/or SA). 
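
A minimal sketch of such per-faculty ratios follows, assuming the expenditure totals have already been extracted from the F survey and the full-time faculty count from the S or SA. The function name, the category keys, and all dollar figures are invented for illustration and are not actual IPEDS variable names or values.

```python
def expenditure_per_faculty(expenditures: dict, full_time_faculty: int) -> dict:
    """Divide each expenditure category by the full-time faculty count."""
    if full_time_faculty <= 0:
        raise ValueError("full_time_faculty must be positive")
    return {
        category: amount / full_time_faculty
        for category, amount in expenditures.items()
    }


# Hypothetical institution: category labels and amounts are made up.
sample_expenditures = {
    "instruction": 50_000_000,
    "research": 20_000_000,
    "library": 4_000_000,
}
ratios = expenditure_per_faculty(sample_expenditures, full_time_faculty=500)
for category, value in ratios.items():
    print(f"{category}: ${value:,.0f} per full-time faculty member")
```

The same function can be applied to each peer institution in turn, provided the faculty counts are drawn from the same survey and year as the finance data.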

Conclusion 

Clearly, the national datasets have much to offer policy 
analysis, institutional research, peer comparisons, and 
other types of research about faculty. The 11 datasets 
about faculty, two datasets about student enrollment, two 
datasets about degrees, and two datasets about 
institutional activity represent a wealth of information, rich 
for mining on a myriad of questions. 

Potential users should not be daunted by the complexity 
of these surveys. While there is a natural learning curve 
with understanding the data elements, value labels, 
instrumentation, and report structures of each survey, this is 
really an issue of learning to use any dataset well. What are 
the meaningful ways to group, sort, aggregate, merge, 
recode, and query data? Once users begin to think this way 
about data, it is much easier to approach a new dataset and 
explore ways in which it could be used. 

As documented in the reviews, all of the datasets are 
available for free in some census date and format except for 
the CUPA, AAUP, HERI, and Oklahoma State surveys. 
With the NCES Interactive IPEDS Database and 
WebCASPAR, analysts have user-friendly and relatively 
quick methods to conduct institutional or other types of 
comparisons. The DAS on the Web for NSOPF and the 
online version of SESTAT are more complicated to use, but 
represent a significant improvement over requirements for 
license agreements and SAS/SPSS programming. For 
those who prefer the SAS environment or Microsoft Access, 
there is extensive documentation from NCES for its data 
files. 

In reviewing the datasets, some brief examples were 
described about ways in which they might be used for 
policy analysis. Virtually any area of faculty research may 
be examined with these data. For studies of faculty 
retirement, the NSOPF and SDR are valuable resources. 
Basic ratios of faculty workload may be calculated using 
the IPEDS S, SA, and EF survey data. More complex 
analysis of workload may be conducted with the NSOPF 
and HERI. Campus surveys of faculty may be 
benchmarked against questions in the national datasets. 

Table 2 outlines typical research topics and potential 
data sources (Appendix B). This listing is not exhaustive 
and users may find many new ways to use the datasets for 
their purposes. The table is prepared as a guide to where 
to begin looking for data to respond to broad policy 
questions. 

This evolution in the dissemination and diffusion of the 
national datasets is possible because of significant efforts 
by NCES and NSF to improve access to the data. The 
National Postsecondary Education Cooperative with its 
goal of “Better Decisions Through Better Data” is also 
critical to this effort. Another important effort is the program 
of research grants and institutes titled “Improving Institutional 
Research in Postsecondary Educational Institutions.” 
Managed by the Association for Institutional Research, 
with financial support from NCES and NSF, one of the 
goals of this project is to “foster the use of the federal data 
bases to inform researchers on institutional research in 
postsecondary education” (AIR, 1998, 
http://ainweb.org/GDRES99.html). 
APPENDIX A 

Table 1: Datasets by Availability 

Dataset       Source  WebCASPAR  IPEDS Interactive  SESTAT  FTP  Free  Print  CD  License  For Sale  Other Web

Faculty
S - IPEDS     NCES    -          -                  -       X    X     X      X   -        X         -
SA - IPEDS    NCES    X          X                  -       X    X     X      X   -        X         X
CUPA          CUPA    -          -                  -       -    -     X      -   -        X         -
OK            OK      -          -                  -       -    -     X      -   -        X         -
AAUP          AAUP    -          -                  -       -    -     X      -   -        X         -
NSOPF         NCES    -          -                  -       -    X     X      X   X        -         X
HERI          HERI    -          -                  -       -    -     X      -   -        X         -
SDR           NSF     -          -                  X       -    X     X      -   X        -         -
NRC           NRC     X          -                  -       -    -     X      X   -        -         X
NSCG          NSF     -          -                  X       -    X     X      -   X        -         -
NSRCG         NSF     -          -                  X       -    X     X      -   X        -         -

Enrollment
EF - IPEDS    NCES    X          X                  -       X    X     X      X   -        X         X
GSS           NSF     X          -                  -       X    X     X      -   -        -         -

Degrees
SED           NSF     X          -                  -       -    X     X      -   X        -         -
C - IPEDS     NCES    X          X                  -       X    X     X      X   -        X         X

Institutional activity
IC - IPEDS    NCES    X          -                  -       X    X     X      X   -        X         -
F - IPEDS     NCES    X          -                  -       X    X     X      X   -        X         -
APPENDIX B 

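Table 2: Research Topics by Dataset 

Research topic                     CUPA  OK    AAUP  NSOPF HERI  SDR   NRC   NSCG  NSRCG GSS   SED   C     EF    IC    F     S     SA

Access                             X     X     X     X     X     X     -     X     X     X     X     X     X     X     -     X     X
Administrative Overhead            -     -     -     -     -     -     -     -     -     -     -     -     -     -     X     X     -
Attitudes                          -     -     -     X     X     -     -     -     -     -     -     -     -     -     -     -     -
Availability Statistics            X     X     X     -     -     -     -     -     -     -     X     X     -     -     -     X     X
Demise of Tenure                   -     -     X     X     -     X     -     -     -     -     -     -     -     -     -     X     X
Expenditure Ratios (FTE)           -     -     -     -     -     -     -     -     -     -     -     -     X     X     X     -     X
Faculty Careers                    -     -     -     X     X     X     -     X     X     -     X     -     -     -     -     -     -
Faculty Demographics               X     X     X     X     X     X     -     X     X     -     X     X     -     -     -     X     X
Faculty Rank Mobility              -     -     -     X     X     X     -     -     -     -     -     -     -     -     -     -     -
Faculty Compensation               X     X     X     X     -     X     -     X     X     -     -     -     -     -     X     X     X
Faculty Supply & Demand            X     X     X     X     X     X     X     X     X     X     X     X     X     X     X     X     X
Faculty Work                       -     -     -     X     X     X     X     X     X     -     -     -     -     -     -     -     -
Federal Funding Issues             -     -     -     -     -     X     X     -     -     X     X     -     -     -     X     -     -
Graduate Ed Changes                -     -     -     X     -     X     -     -     -     X     X     X     X     -     -     -     -
New Faculty Hires                  X     X     -     -     -     X     -     -     -     -     X     -     -     -     -     X     -
Non-Doctoral Faculty               -     -     -     X     X     -     -     X     X     -     -     -     -     -     -     -     -
Non-Faculty Research Staff         -     -     -     -     -     -     -     -     -     X     -     -     -     -     -     -     -
Part-Time Faculty                  -     -     -     X     X     X     -     X     X     -     -     -     -     -     -     X     -
Ph.D. Production                   -     -     -     -     -     -     -     -     -     X     X     X     X     -     -     -     -
Pipeline Studies                   X     X     -     -     -     X     -     X     X     X     X     X     X     -     -     X     X
Postdoctorates                     X     -     -     X     -     X     -     -     -     X     X     -     -     -     -     -     -
Program Rankings                   -     -     -     -     -     -     X     -     -     X     X     X     X     -     -     -     -
Research/Instructional Assistants  -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     X     -
Research/Service Faculty           -     -     -     X     X     X     X     X     -     -     -     -     -     -     -     X     -
Research Productivity              -     -     -     X     X     X     X     -     -     -     -     -     -     -     X     -     -
Retirement                         -     -     -     X     X     X     -     X     X     -     -     -     -     -     -     -     -
Salary Compression (disc)          X     X     -     X     -     X     -     X     X     -     -     -     -     -     -     -     -
Salary Equity                      X     X     X     X     -     X     -     X     X     -     -     -     -     -     -     X     X
Staffing Patterns                  -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     X     -
Time to Degree                     -     -     -     X     X     X     -     X     X     -     X     -     -     -     -     -     -
Workload - Complex                 -     -     -     X     X     -     -     -     -     -     -     -     -     -     -     -     -
Workload - Ratios                  -     -     -     X     X     X     -     -     -     -     -     X     X     X     -     X     X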